Longest Common Subsequence
Let's solve the Longest Common Subsequence problem using Dynamic Programming.
Statement#
Suppose you are given two strings. You need to find the length of the longest common subsequence between these two strings.
A subsequence is a string formed by removing some characters from the original string, while maintaining the relative position of the remaining characters. For example, “abd” is a subsequence of “abcd”, where the removed character is “c”.
If there is no common subsequence, then return 0.
Let’s say you have the following two strings:
- “cloud”
- “found”
The longest common subsequence between these two strings is “oud”, which has a length of .
Constraints:
-
str1.length -
str2.length str1andstr2contain only lowercase English characters.
Examples#
No. | str1 | str2 | Length |
1 | "bald" | "bad" | 3 |
2 | "nocturnal" | "nick" | 2 |
3 | "card" | "tissue" | 0 |
Try it yourself#
Implement your solution in the following coding playground.
Note: If you clicked the “Submit” button and the code timed out, this means that your solution needs to be optimized in terms of running time.
Hint: Use dynamic programming and see the magic.
Solution#
We will first explore the naive recursive solution to this problem and then see how it can be improved using the Longest Common Substring dynamic programming pattern.
Naive approach#
A naive approach is to compare the characters of both strings based on the following rules:
-
If the current characters of both strings match, we move one position ahead in both strings.
-
If the current characters of both strings do not match, we recursively calculate the maximum length of moving one character forward in any one of the two strings i.e., we check if moving a character forward in either the first string or the second will give us a longer subsequence.
-
If we reach the end of either of the two strings, we return .
Let’s look at the following illustration to get a better understanding of the solution:
1 of 10
2 of 10
3 of 10
4 of 10
5 of 10
6 of 10
7 of 10
8 of 10
9 of 10
10 of 10
Let’s implement the algorithm as discussed above:
Note: Please observe that if you include the test case commented out in the driver code, the solution is likely to time out. Try to solve the larger problem using the dynamic programming solutions provided below and see the difference.
Time complexity#
The time complexity of the naive approach is , where and are the lengths of the two strings respectively. This is because, at each point, we can have a maximum of two possibilities: either move one character ahead in the first string or in the second string.
Space complexity#
As there are recursive calls, and each call stores its data on the stack, the space complexity of this approach is .
Optimized Solution using dynamic programming#
Now, let’s improve the recursive solution using top-down and bottom-up approaches.
Top-down solution#
The top-down solution, commonly known as the memoization technique, is an enhancement of the recursive solution. It overcomes the problem of calculating redundant solutions over and over again by storing them in an array. In the recursive approach, the following two variables kept changing:
-
The index
i, used to keep track of the current character instr1. -
The index
j, used to keep track of the current character instr2.
We will use a 2-D table, dp, with rows and columns to store the result at any given state. At any later time, if we encounter the same subproblem, we can return the stored result from the table with an lookup, instead of recalculating that subproblem.
Let’s look at the following illustration to get a better understanding of the solution:
1 of 45
2 of 45
3 of 45
4 of 45
5 of 45
6 of 45
7 of 45
8 of 45
9 of 45
10 of 45
11 of 45
12 of 45
13 of 45
14 of 45
15 of 45
16 of 45
17 of 45
18 of 45
19 of 45
20 of 45
21 of 45
22 of 45
23 of 45
24 of 45
25 of 45
26 of 45
27 of 45
28 of 45
29 of 45
30 of 45
31 of 45
32 of 45
33 of 45
34 of 45
35 of 45
36 of 45
37 of 45
38 of 45
39 of 45
40 of 45
41 of 45
42 of 45
43 of 45
44 of 45
45 of 45
Let’s implement the algorithm as discussed above:
Note: Please observe that if you now uncomment the large test case in the driver code, the dynamic programming solution will still work.
Time complexity#
Let’s think of this in terms of the keyspace mapped by the tuple of i and j. For strings of length and , i can go from to , while j can go from to . Therefore, the total number of unique subproblems to evaluate and store are . So, the time complexity of this approach is reduced to because we avoid redundant calculations by storing all the intermediate results in a 2-D array.
Space complexity#
We now need space in memory to store intermediate values in the table.
Bottom-up solution#
The bottom-up solution, also known as the tabulation technique, is an iterative method of solving dynamic problems. We first make a 2-D array of size , where is the length of str1 and is the length of str2. This array is initialized to . We need the first row and column to be for the base case. Any entry in this array given by dp[i][j] is the length of the longest common subsequence between str1 up to position and str2 up to the position.
As we saw in the recursive algorithm, when the characters matched, we could simply move one position ahead in both strings and add one to the result. This is exactly what we do here as well. We add to the previous diagonal result. In case we have a mismatch, we need to take the maximum of two subproblems:
dp[i][j] = max(dp[i-1][j], dp[i][j-1])
We could either move one position ahead in str1 (the subproblem dp[i-1][j]), or we could move one step ahead in str2 (the subproblem dp[i][j-1]). In the end, we have the optimal answer for str1 and str2 in the last position, i.e., dp[n][m].
Let’s look at the following illustration to get a better understanding of the solution:
1 of 14
2 of 14
3 of 14
4 of 14
5 of 14
6 of 14
7 of 14
8 of 14
9 of 14
10 of 14
11 of 14
12 of 14
13 of 14
14 of 14
Let’s implement the algorithm as discussed above:
Note: Please observe that if you now uncomment the large test case in the driver code, the dynamic programming solution will still work.
Time complexity#
As we created a 2-D array to store the results of subproblems that would be used later on, therefore, the time complexity of this approach will also be .
Space complexity#
We need space in memory to store the intermediate values.
Can we do better?#
You must have noticed in the above algorithm that to fill up a row, we only require the previous row’s values; that is, for filling the row against the character, we require the values from the previous row representing the character. Therefore, there is no point in storing all the previous rows.
We can further improve this approach by using a 1-D array of length instead of creating a table. Next, we update this array for each element using the same calculation which we used earlier.
The time complexity would remain the same, , because we still have to do the calculation for each element. The space complexity, however, reduces to since we are only maintaining an array of the size .
Solving the Longest Common Substring Problem
Shortest Common Supersequence